We assume that a binary function \(Y(t)\) is generated by a latent Gaussian function \(\eta(t)\), as follows:
\[Y_i(t) \sim \mathrm{Bernoulli}\left(\frac{\exp(\eta_i(t))}{1+\exp(\eta_i(t))}\right)\] \[\eta_i(t) = \beta_0(t)+b_i(t)\]
This latent function can be approximated by a truncated Karhunen–Loève expansion:
\[\eta_i(t) = f_0(t)+\sum_{k=1}^{K}\xi_{ik}\phi_k(t)+\epsilon_i(t)\]
\[\eta_i(t) = f_0(t)+ \xi_{i1}\sin(2\pi t)+\xi_{i2}\cos(2\pi t)+\xi_{i3}\sin(4\pi t)+\xi_{i4}\cos(4\pi t)\]
where \(\boldsymbol{\xi}_{i} \sim N(0, \boldsymbol{\Gamma})\).
\[Y_i(t) \sim \mathrm{Bernoulli}\left(\frac{\exp(\eta_i(t))}{1+\exp(\eta_i(t))}\right)\]
In this simulation, we set \(f_0(t) = 0\), and \(\boldsymbol{\Gamma}\) is a diagonal matrix with diagonal elements \(\{1, 0.5, 0.25, 0.125\}\).
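The generative model above can be sketched in a few lines of numpy (the analyses in this write-up use R; this Python sketch assumes time rescaled to \((0,1)\), 1000 measurement times per subject to match the bin midpoints 5–995, and an illustrative 100 subjects):

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_obs = 100, 1000                    # subjects; 1000 times -> 100 bins of 10
t = (np.arange(n_obs) + 0.5) / n_obs         # measurement times rescaled to (0, 1)
gamma = np.array([1, 0.5, 0.25, 0.125])      # diagonal of Gamma
Phi = np.column_stack([np.sin(2*np.pi*t), np.cos(2*np.pi*t),
                       np.sin(4*np.pi*t), np.cos(4*np.pi*t)])
xi = rng.normal(size=(n_subj, 4)) * np.sqrt(gamma)   # xi_i ~ N(0, Gamma)
eta = xi @ Phi.T                                     # latent functions, f_0 = 0
Y = rng.binomial(1, 1 / (1 + np.exp(-eta)))          # binary outcomes via expit
```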
Question 1: In the FPCA framework, \(\boldsymbol{\Gamma}\) is a diagonal matrix of eigenvalues. However, in my implementation, the estimated eigenvalues are much larger than the true \(\boldsymbol{\Gamma}\). I am not sure why this happens. Does it have to do with the GLMM step, or with the decreased density when binning observations?
The entire time grid is binned into 100 small intervals of equal length. The cut is made on the order index of the measurement times, not on the actual times themselves. Each bin is labeled by its midpoint (5, 15, 25, …, 995).
Within each interval labeled as \(s\), we fit a local GLMM:
\[logit(E(Y_i^s)) = \beta_0^s+b_i^s\] We then use this series of models to estimate the latent Gaussian function: each subject gets its own latent-function value at each bin.
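To illustrate the per-bin structure of this step, here is a crude numpy stand-in. It is not the local GLMM itself: instead of estimating \(\beta_0^s + b_i^s\) by mixed-model fitting, it uses a smoothed per-subject empirical logit within each bin, which is only a rough proxy:

```python
import numpy as np

def bin_empirical_logit(Y, n_bins=100):
    """Crude per-bin stand-in for the local GLMM: within each bin, estimate
    each subject's latent value with a smoothed empirical logit.
    (The write-up fits logit E(Y_i^s) = beta_0^s + b_i^s instead.)"""
    n_subj, n_obs = Y.shape
    width = n_obs // n_bins
    eta_hat = np.empty((n_subj, n_bins))
    for s in range(n_bins):
        block = Y[:, s*width:(s+1)*width]      # observations falling in bin s
        h = block.sum(axis=1)                  # successes per subject
        n = block.shape[1]                     # trials per subject in this bin
        p = (h + 0.5) / (n + 1.0)              # smoothed proportion (avoids 0/1)
        eta_hat[:, s] = np.log(p / (1 - p))    # empirical logit
    return eta_hat

# one subject, one bin: h = 1 success out of n = 2 -> logit(0.5) = 0
print(bin_empirical_logit(np.array([[1, 0]]), n_bins=1))   # [[0.]]
```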
Question 2: When fitting local GLMMs, it seems that the intervals are essentially independent. We do not consider the effect of neighborhood intervals on a specific interval. This means that correlation between repeated measures from the same subject is only introduced by the latent function?
The figure below shows the “mean” latent function across subjects, which is essentially \(\beta_0^s\) from the series of mixed models.
Figure below shows the estimated latent function for subjects 1-4:
Here we fit FPCA on the estimated latent function from the previous step:
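One possible contributor to Question 1 (a guess, sketched numerically below): if the FPCA step normalizes eigenvectors to unit Euclidean norm on the 100-point grid, as a plain eigendecomposition does, while the true eigenfunctions \(\sin(2\pi t)\), … have squared grid norm \(S/2 = 50\), then that factor is absorbed into the scores, and the estimated eigenvalues come out roughly 50 times larger than the diagonal of \(\boldsymbol{\Gamma}\). A numpy sketch with noise-free latent curves and an illustrative sample size:

```python
import numpy as np

rng = np.random.default_rng(2)
S, n_subj = 100, 2000                        # binned grid; illustrative sample size
t = (np.arange(S) + 0.5) / S
gamma = np.array([1, 0.5, 0.25, 0.125])
Phi = np.column_stack([np.sin(2*np.pi*t), np.cos(2*np.pi*t),
                       np.sin(4*np.pi*t), np.cos(4*np.pi*t)])
# noise-free latent curves on the binned grid
eta = (rng.normal(size=(n_subj, 4)) * np.sqrt(gamma)) @ Phi.T

# FPCA via eigendecomposition of the sample covariance: eigenvectors get unit
# Euclidean norm, but sum_s sin(2 pi t_s)^2 = S/2 = 50 on this grid, so the
# eigenvalues absorb that factor
C = np.cov(eta, rowvar=False)
evals = np.linalg.eigvalsh(C)[::-1]          # descending order
print(evals[:4] / gamma)                     # each ratio is close to S/2 = 50
```

If this is what is happening, comparing the raw eigenvalues directly to \(\boldsymbol{\Gamma}\) mixes two normalizations; rescaling the eigendecomposition by the quadrature weight (bin width) would be needed before the comparison.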
Follow-up on Question 1: It looks like the ranges of the estimated PC functions differ from those of the true PC functions. Maybe some of their variation is absorbed by the scores, causing the estimated \(\boldsymbol{\Gamma}\) to be much greater than its true value?
Now let’s try to predict future outcomes based on observations \(\boldsymbol{Y}_m\) up to \(t_m\).
For in-sample prediction, we have observed the full outcome track and thus have estimated the full latent Gaussian function \(\eta(s)\) on the binned grid. For now, we take the quantities up to \(t_m\) and re-estimate the scores with empirical Bayes:
\[\hat{\boldsymbol{\xi}} = E(\boldsymbol{\xi}|\boldsymbol{Y}_m) = \boldsymbol{\hat{\Gamma}\Phi ^T_m}(\boldsymbol{\Phi_m\hat{\Gamma}\Phi^T_m}+\hat{\sigma}_{\epsilon}^2\boldsymbol{I_m})^{-1}(\hat{\boldsymbol{\eta}}_m-\hat{\boldsymbol{f}}_0^m)\] where \(\boldsymbol{\Phi}_m\) holds the eigenfunctions evaluated on the bins up to \(t_m\), \(\hat{\boldsymbol{\eta}}_m\) and \(\hat{\boldsymbol{f}}_0^m\) are the estimated latent and mean functions on those bins, \(\hat{\boldsymbol{\Gamma}}\) is the estimated score covariance, and \(\boldsymbol{I}_m\) is the \(m \times m\) identity matrix.
And then, we use the FPCA model for point prediction on the entire binned grid:
\[\boldsymbol{\hat{\eta}} = \hat{\boldsymbol{f}}_0+\boldsymbol{\Phi}\boldsymbol{\hat{\xi}}\]
Here, \(\boldsymbol{\Phi}\) contains the eigenfunctions on the entire binned grid. Although \(\hat{\boldsymbol{\eta}}\) is calculated for the entire binned grid, we only take the values after \(t_m\).
Let
\[\boldsymbol{H} = \boldsymbol{\hat{\Gamma}\Phi ^T_m}(\boldsymbol{\Phi_m\hat{\Gamma}\Phi^T_m}+\hat{\sigma_{\epsilon}}^2\boldsymbol{I_m})^{-1}\]
The variance of \(\hat{\boldsymbol{\eta}}\) can be estimated by:
\[Var(\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})=\boldsymbol{\Phi}(\hat{\boldsymbol{\Gamma}}-\boldsymbol{H}\boldsymbol{\Phi_m}\hat{\boldsymbol{\Gamma}})\boldsymbol{\Phi}^T+\hat{\sigma}_{\epsilon}^2\boldsymbol{I}_S\]
It should be an \(S \times S\) matrix; the diagonal entries are variances at specific times and the off-diagonal entries are covariances between different times on the binned grid.
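The in-sample formulas above, empirical Bayes scores, point prediction, and the \(S \times S\) predictive covariance, can be sketched in numpy. Here the true \(\boldsymbol{\Gamma}\) and an assumed \(\hat{\sigma}_{\epsilon}^2 = 0.1\) stand in for the estimated quantities, with \(m = 40\) observed bins (all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
S, K, m = 100, 4, 40                        # grid size, no. of PCs, observed bins
t = (np.arange(S) + 0.5) / S                # bin midpoints, time rescaled to (0, 1)
Gamma = np.diag([1, 0.5, 0.25, 0.125])      # true covariance, standing in for Gamma-hat
sig2 = 0.1                                  # assumed value for sigma_eps^2
Phi = np.column_stack([np.sin(2*np.pi*t), np.cos(2*np.pi*t),
                       np.sin(4*np.pi*t), np.cos(4*np.pi*t)])
f0 = np.zeros(S)                            # f_0 = 0 in this simulation

# one subject's estimated latent values on the first m bins (simulated here)
xi_true = rng.multivariate_normal(np.zeros(K), Gamma)
eta_m = f0[:m] + Phi[:m] @ xi_true + rng.normal(0.0, np.sqrt(sig2), m)

# empirical Bayes scores and point prediction on the full grid
Phi_m = Phi[:m]
H = Gamma @ Phi_m.T @ np.linalg.inv(Phi_m @ Gamma @ Phi_m.T + sig2 * np.eye(m))
xi_hat = H @ (eta_m - f0[:m])
eta_hat = f0 + Phi @ xi_hat

# S x S predictive covariance: Phi (Gamma - H Phi_m Gamma) Phi^T + sig2 I
V = Phi @ (Gamma - H @ Phi_m @ Gamma) @ Phi.T + sig2 * np.eye(S)
```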
Below we try to do that with a few subjects:
Now suppose we have a new subject \(\boldsymbol{Y}\) with observations up to \(t_m\), where \(t_m\) falls in the \(m\)th bin. There seems to be more than one way to predict the remaining track. The first approach that came to mind was in fact to 1) refit the local GLMMs to estimate an incomplete latent function; 2) refit the FPCA with this additional incomplete latent-function track; and 3) repeat the score estimation from the in-sample procedure.
However, if we wish to make predictions without re-estimating any model (neither the local GLMMs nor the FPCA), we need to use the fitted FPCA model (specifically, \(\boldsymbol{\hat{f}}_0\), \(\boldsymbol{\Phi}\), \(\boldsymbol{\hat{\Gamma}}\), \(\hat{\sigma}_{\epsilon}\)) to estimate the scores for the new observations (out-of-sample scores). I can think of two ways of doing this: computing the posterior mean \(E(\boldsymbol{\xi}|\boldsymbol{Y})\) by numerical integration, and finding the posterior mode of \(p(\boldsymbol{\xi}|\boldsymbol{Y})\).
Below I experimented with both procedures.
Let's say that on the binned grid, at each interval \(s\), the new subject has \(n_s\) observations, among which \(h_s\) are successes: \(h_s = \sum_{j=1}^{n_s} I(Y_{sj}=1)\), where \(j\) indexes observations within a specific interval.
Now let's find the empirical Bayes estimator of the scores, \(E(\boldsymbol{\xi}|\boldsymbol{Y})\), which is unlikely to have a closed-form solution.
\[E(\boldsymbol{\xi}|\boldsymbol{Y})=\int \boldsymbol{\xi} p(\boldsymbol{\xi}|\boldsymbol{Y})d\boldsymbol{\xi}\] \[p(\boldsymbol{\xi}|\boldsymbol{Y}) = \frac{p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})}{\int p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})d\boldsymbol{\xi}}\]
Thus:
\[E(\boldsymbol{\xi}|\boldsymbol{Y})=\frac{\int \boldsymbol{\xi}p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})d\boldsymbol{\xi}}{\int p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})d\boldsymbol{\xi}}\]
We know that FPCA scores follow a multivariate normal distribution:
\[p(\boldsymbol{\xi}) = (2\pi)^{-K/2}\det(\boldsymbol{\Gamma})^{-1/2}\exp(-\boldsymbol{\xi}^T \boldsymbol{\Gamma}^{-1} \boldsymbol{\xi}/2)\]
And the binary outcome, conditional on score, follows a Binomial distribution:
\[\begin{aligned} p(\boldsymbol{Y}|\boldsymbol{\xi}) &= \prod_{s=1}^{m}p(\boldsymbol{Y}_s|\boldsymbol{\xi}) \\ & = \prod_{s=1}^{m} \binom{n_s}{h_s}p_s^{h_s}(1-p_s)^{n_s-h_s}\\ logit(p_s) & = \hat{\eta}_s = \hat{f}_{0}(s)+\phi(s)\boldsymbol{\xi} \end{aligned}\]
Follow-up on Question 2: Note that here we assume the \(\boldsymbol{Y}_s\) are independent of each other given the fPCA scores. That is, all correlation across time within the same subject is introduced by the underlying eigenfunctions.
It is very difficult to get a closed-form solution for this formula. Therefore, I tried to approximate the integrals with numerical integration (the adaptIntegrate function from the cubature package). Here I ran into a computation-time problem: numerical integration was very, very slow, taking roughly an hour for just the denominator of one subject. The function values are also very small, which risks numerical underflow.
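To see the cost concretely, here is a brute-force grid version of the same integrals, reduced to \(K = 2\) scores with made-up toy counts and eigenfunctions (all values hypothetical). Even this toy case needs \(201^2\) integrand evaluations; the grid (or adaptIntegrate's subdivisions) grows exponentially in \(K\):

```python
import numpy as np

# hypothetical binned counts for one new subject on m = 3 intervals
n_s = np.array([10, 10, 10])       # trials per bin (made up)
h_s = np.array([7, 5, 2])          # successes per bin (made up)
Phi_m = np.array([[1.0, 0.0],      # toy eigenfunctions on the observed bins,
                  [0.7, 0.7],      # K = 2 (made up)
                  [0.0, 1.0]])
Gamma_inv = np.linalg.inv(np.diag([1.0, 0.5]))

# evaluate log p(Y|xi) + log p(xi), up to constants in xi, on a 2-D grid
g = np.linspace(-4, 4, 201)
X1, X2 = np.meshgrid(g, g, indexing="ij")
pts = np.stack([X1.ravel(), X2.ravel()], axis=1)   # (201^2, 2) grid points
eta = pts @ Phi_m.T                                # latent values per grid point
loglik = eta @ h_s - np.log1p(np.exp(eta)) @ n_s   # binomial log-likelihood
logprior = -0.5 * np.einsum("nk,kl,nl->n", pts, Gamma_inv, pts)
w = np.exp(loglik + logprior)                      # unnormalized posterior

# posterior mean = ratio of two integrals, both approximated by Riemann sums
# (binomial coefficients cancel between numerator and denominator)
post_mean = (pts * w[:, None]).sum(axis=0) / w.sum()
print(post_mean)
```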
We maximize the same conditional likelihood of scores (conditional on observations):
\[p(\boldsymbol{\xi}|\boldsymbol{Y}) = \frac{p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})}{\int p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})d\boldsymbol{\xi}}\]
But now we no longer need to calculate the marginal distribution of \(\boldsymbol{Y}\) (the denominator), since it is constant with respect to \(\boldsymbol{\xi}\):
\[\begin{aligned} p(\boldsymbol{\xi}|\boldsymbol{Y}) & \propto p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi}) \\ l(\boldsymbol{\xi}|\boldsymbol{Y}) & \propto \log p(\boldsymbol{Y}|\boldsymbol{\xi})+\log p(\boldsymbol{\xi}) \end{aligned}\]
Plug in the conditional log-likelihood of \(\boldsymbol{Y}\) and marginal log-likelihood of \(\boldsymbol{\xi}\):
\[\begin{aligned} l(\boldsymbol{\xi}|\boldsymbol{Y}) & \propto \sum_{s=1}^m h_s\eta_s-\sum_{s=1}^m n_s\log(1+\exp(\eta_s))-\boldsymbol{\xi}^T\boldsymbol{\Gamma}^{-1}\boldsymbol{\xi}/2\\ \frac{dl(\boldsymbol{\xi}|\boldsymbol{Y})}{d\boldsymbol{\xi}}& =\sum_{s=1}^m h_s\phi(s)-\sum_{s=1}^m n_s\frac{\exp(\eta_s)}{1+\exp(\eta_s)}\phi(s)-\boldsymbol{\xi}^T\boldsymbol{\Gamma}^{-1} = 0 \end{aligned}\]
It is not easy to get a closed-form solution for the score equation above, so we can try a numeric solution instead (the multiroot function from the rootSolve package).
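A sketch of this numeric root-finding, using scipy.optimize.root in place of rootSolve::multiroot, on made-up toy counts and \(K = 2\) eigenfunctions (all values hypothetical):

```python
import numpy as np
from scipy.optimize import root

n_s = np.array([10, 10, 10])       # trials per bin (made up)
h_s = np.array([7, 5, 2])          # successes per bin (made up)
Phi_m = np.array([[1.0, 0.0],      # toy K = 2 eigenfunctions (made up)
                  [0.7, 0.7],
                  [0.0, 1.0]])
Gamma_inv = np.linalg.inv(np.diag([1.0, 0.5]))

def score_eq(xi):
    """Gradient of the log posterior: Phi_m^T (h - n * expit(eta)) - Gamma^{-1} xi."""
    p = 1.0 / (1.0 + np.exp(-(Phi_m @ xi)))    # expit(eta_s)
    return Phi_m.T @ (h_s - n_s * p) - Gamma_inv @ xi

sol = root(score_eq, x0=np.zeros(2))           # numeric root, like rootSolve::multiroot
xi_map = sol.x                                  # posterior mode of the scores
print(xi_map)
```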
It seems that \(E(\boldsymbol{\xi}|\boldsymbol{Y})\) and the mode of \(p(\boldsymbol{\xi}|\boldsymbol{Y})\) can be connected through the Laplace approximation: it replaces the posterior with a Gaussian centered at the posterior mode, so the approximate posterior mean equals the mode. In this sense, the two methods above are essentially equivalent.
Below I have experimented with LaplacesDemon::LaplaceApproximation: https://search.r-project.org/CRAN/refmans/LaplacesDemon/html/LaplaceApproximation.html
This section evaluates the performance of out-of-sample prediction of individual tracks. Our reference method is Generalized Linear Mixed Models fitted with adaptive Gaussian quadrature (the GLMMadaptive package), with a fixed effect of time and a subject-level random intercept and slope:
\[g(E(Y_i)) = \beta_0+\beta_1t+b_{i0}+b_{i1}t\] We use two metrics for evaluation. The first is the squared error of the predicted latent function after \(t_m\), averaged across subjects:
\[\frac{1}{N}\sum_{i=1}^N\sum_{s=m+1}^S(\hat{\eta}_{is}-\eta_{is})^2\]
The second, computation time, is a bit more complicated: the full computation time consists of two parts, model fitting and prediction. The table below includes only the time spent on prediction.
| Maximum observed time | fGFPCA | GLMMadaptive | fGFPCA | GLMMadaptive |
|---|---|---|---|---|
| 195 | 71.328 | 156.269 | 0.686 | 0.538 |
| 395 | 24.033 | 101.512 | 0.709 | 0.499 |
| 595 | 3.543 | 81.784 | 0.699 | 0.423 |
With an estimate of the scores \(\hat{\boldsymbol{\xi}}\), we have
\[\boldsymbol{\hat{\eta}}=\boldsymbol{\hat{f_0}}+\boldsymbol{\Phi\hat{\xi}}\] \[\begin{aligned} \hat{\boldsymbol{\eta}}-\boldsymbol{\eta}&=\boldsymbol{\hat{f_0}}+\boldsymbol{\Phi\hat{\xi}}-\boldsymbol{\eta}\\ &= \boldsymbol{\hat{f_0}-f_0}+\boldsymbol{\Phi(\hat{\xi}-\xi)}-\boldsymbol{\epsilon} \end{aligned}\]
\[Var(\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})=Var(\boldsymbol{\hat{f_0}-f_0})+\boldsymbol{\Phi Var(\hat{\xi}-\xi) \Phi^T}+Var(\epsilon)\] assuming the three error components are mutually independent.
Below are some potential methods:

- Reuse the in-sample empirical Bayes formula: with \[\boldsymbol{H} = \boldsymbol{\hat{\Gamma}\Phi ^T_m}(\boldsymbol{\Phi_m\hat{\Gamma}\Phi^T_m}+\hat{\sigma}_{\epsilon}^2\boldsymbol{I_m})^{-1}\] \[Var(\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})=\boldsymbol{\Phi}(\hat{\boldsymbol{\Gamma}}-\boldsymbol{H}\boldsymbol{\Phi_m}\hat{\boldsymbol{\Gamma}})\boldsymbol{\Phi}^T+\hat{\sigma}_{\epsilon}^2\boldsymbol{I}_S\]
- Use the observed information of the scores at the posterior mode (from the Laplace approximation): \[Var(\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})=\boldsymbol{\Phi}I(\hat{\boldsymbol{\xi}})^{-1}\boldsymbol{\Phi}^T+\hat{\sigma}_{\epsilon}^2\boldsymbol{I}_S \] Standard deviations would then change across subjects.
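The empirical Bayes covariance formula \(\boldsymbol{\Phi}(\boldsymbol{\Gamma}-\boldsymbol{H}\boldsymbol{\Phi}_m\boldsymbol{\Gamma})\boldsymbol{\Phi}^T+\sigma_{\epsilon}^2\boldsymbol{I}\) can be checked with a quick Monte Carlo simulation, restricted to bins after \(t_m\) where the measurement error is independent of the fitted scores (true parameter values plugged in; simulation sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
S, K, m, nsim = 100, 4, 40, 20000           # sizes are arbitrary choices
t = (np.arange(S) + 0.5) / S
Gamma = np.diag([1, 0.5, 0.25, 0.125])
sig2 = 0.1                                  # assumed sigma_eps^2
Phi = np.column_stack([np.sin(2*np.pi*t), np.cos(2*np.pi*t),
                       np.sin(4*np.pi*t), np.cos(4*np.pi*t)])
Phi_m = Phi[:m]
H = Gamma @ Phi_m.T @ np.linalg.inv(Phi_m @ Gamma @ Phi_m.T + sig2 * np.eye(m))

# simulate latent curves, estimate scores from the first m noisy bins, predict
xi = rng.multivariate_normal(np.zeros(K), Gamma, size=nsim)
eps = rng.normal(0.0, np.sqrt(sig2), size=(nsim, S))
eta = xi @ Phi.T + eps                       # f_0 = 0
eta_hat = (eta[:, :m] @ H.T) @ Phi.T
err = eta_hat - eta

# theoretical covariance vs. empirical variance on bins after t_m
V = Phi @ (Gamma - H @ Phi_m @ Gamma) @ Phi.T + sig2 * np.eye(S)
rel_err = np.abs(err[:, m:].var(axis=0) / np.diag(V)[m:] - 1)
print(rel_err.max())                         # within Monte Carlo noise
```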
Here I explored the estimated scores, comparing them with the true scores from the simulation setup. The figures actually didn't look biased to me. Perhaps it is an artifact of the small functional range; I need to think more about that.